203 research outputs found
Regularization for Cox's proportional hazards model with NP-dimensionality
High-throughput genetic sequencing arrays with thousands of measurements per
sample and a great amount of related censored clinical data have increased
the demand for better measurement-specific model selection. In this paper
we establish strong oracle properties of nonconcave penalized methods for
nonpolynomial (NP) dimensional data with censoring in the framework of Cox's
proportional hazards model. A class of folded-concave penalties is employed,
and both LASSO and SCAD are discussed specifically. We address the question
of under which dimensionality and correlation restrictions an oracle estimator
can be constructed and attained. It is demonstrated that nonconcave penalties lead
to significant reduction of the "irrepresentable condition" needed for LASSO
model selection consistency. A large deviation result for martingales,
of independent interest, is developed for characterizing the strong oracle
property. Moreover, the nonconcave regularized estimator is shown to achieve
asymptotically the information bound of the oracle estimator. A coordinate-wise
algorithm is developed for finding the grid of solution paths for penalized
hazard regression problems, and its performance is evaluated on simulated and
gene association study examples.
Comment: Published at http://dx.doi.org/10.1214/11-AOS911 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
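The contrast between LASSO and SCAD mentioned in this abstract can be illustrated with a small sketch. It uses the standard SCAD definition from the general literature with the conventional default a = 3.7; the function names and the default are assumptions for illustration, and the paper's own notation may differ.

```python
def lasso_penalty(theta, lam):
    """L1 penalty: grows linearly in |theta| without bound."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (standard textbook form; a=3.7 is an assumed default).

    Linear near zero like LASSO, quadratic in the middle range,
    and constant once |theta| exceeds a * lam.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    lam = 1.0
    for theta in (0.5, 1.0, 2.0, 3.7, 10.0):
        print(theta, lasso_penalty(theta, lam), scad_penalty(theta, lam))
```

Because the SCAD penalty is flat beyond a * lam, large coefficients are not penalized further, whereas the LASSO penalty keeps growing with the coefficient size.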
Nonparametric tests of the Markov hypothesis in continuous-time models
We propose several statistics to test the Markov hypothesis for
β-mixing stationary processes sampled at discrete time intervals. Our
tests are based on the Chapman-Kolmogorov equation. We establish the
asymptotic null distributions of the proposed test statistics, showing that
Wilks's phenomenon holds. We compute the power of the test and provide
simulations to investigate the finite sample performance of the test statistics
when the null model is a diffusion process, with alternatives consisting of
models with a stochastic mean reversion level, stochastic volatility and jumps.
Comment: Published at http://dx.doi.org/10.1214/09-AOS763 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
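As a reminder of the identity these tests rest on: for a time-homogeneous Markov process observed at sampling interval Δ, the Chapman-Kolmogorov equation links the two-step and one-step transition densities. The notation below (p for the transition density, Δ for the sampling interval) is assumed for illustration and is not taken from the paper.

```latex
p(z \mid x, 2\Delta) = \int p(z \mid y, \Delta)\, p(y \mid x, \Delta)\, dy
```

A test in this spirit would compare a direct estimate of the left-hand side with the right-hand side assembled from one-step estimates; the abstract itself does not spell out the exact statistics.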
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves
nonparametric feature augmentation. Knowing that marginal density ratios are
the most powerful univariate classifiers, we use the ratio estimates to
transform the original feature measurements. Subsequently, penalized logistic
regression is invoked, taking as input the newly transformed or augmented
features. This procedure trains models equipped with local complexity and
global simplicity, thereby avoiding the curse of dimensionality while creating
a flexible nonlinear decision boundary. The resulting method is called Feature
Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by
generalizing the Naive Bayes model, writing the log ratio of joint densities as
a linear combination of those of marginal densities. It is related to
generalized additive models, but has better interpretability and computability.
Risk bounds are developed for FANS. In numerical analysis, FANS is compared
with competing methods, so as to provide a guideline on its best application
domain. Real data analysis demonstrates that FANS performs very competitively
on benchmark email spam and gene expression data sets. Moreover, FANS is
implemented by an extremely fast algorithm through parallel computing.
Comment: 30 pages, 2 figures
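The two steps described in this abstract (transform each feature by an estimated marginal log density ratio, then fit a penalized logistic regression on the transformed features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the L1 penalty solved by proximal gradient descent, the toy data, and all function names are choices made here. For brevity it also reuses the same sample for density estimation and regression, which a careful implementation would likely avoid.

```python
import math
import random


def kde(sample, h):
    """Gaussian kernel density estimate with bandwidth h (assumed tuning value)."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)


def fit_log_ratios(X, y, h=0.5, eps=1e-12):
    """Per feature j, build x -> log f_j(x) - log g_j(x) from class-1 / class-0 KDEs."""
    maps = []
    for j in range(len(X[0])):
        f1 = kde([r[j] for r, lab in zip(X, y) if lab == 1], h)
        f0 = kde([r[j] for r, lab in zip(X, y) if lab == 0], h)
        maps.append(lambda x, f1=f1, f0=f0: math.log(f1(x) + eps) - math.log(f0(x) + eps))
    return maps


def augment(X, maps):
    """Replace every raw feature value by its estimated marginal log density ratio."""
    return [[m(v) for m, v in zip(maps, row)] for row in X]


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def l1_logistic(Z, y, lam=0.05, step=0.1, iters=300):
    """L1-penalized logistic regression by proximal gradient; intercept unpenalized."""
    n, p = len(Z), len(Z[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for row, lab in zip(Z, y):
            eta = b0 + sum(bj * zj for bj, zj in zip(b, row))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            r = 1.0 / (1.0 + math.exp(-eta)) - lab
            g0 += r / n
            for j in range(p):
                g[j] += r * row[j] / n
        b0 -= step * g0
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, g)]
    return b0, b


def predict(Z, b0, b):
    return [1 if b0 + sum(bj * zj for bj, zj in zip(b, row)) > 0 else 0 for row in Z]


# Toy data (hypothetical): class 1 is bimodal in feature 0, so no linear rule
# on the raw feature separates the classes; feature 1 is pure noise.
random.seed(0)
X, y = [], []
for _ in range(100):
    X.append([random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)])
    y.append(0)
    X.append([random.gauss(random.choice([-3.0, 3.0]), 1.0), random.gauss(0.0, 1.0)])
    y.append(1)

maps = fit_log_ratios(X, y)
Z = augment(X, maps)
b0, b = l1_logistic(Z, y)
pred = predict(Z, b0, b)
acc = sum(p == t for p, t in zip(pred, y)) / len(y)
print("coefficients:", b, "training accuracy:", acc)
```

The decision rule is linear in the transformed features but nonlinear in the original ones, which is how a simple penalized linear fit can still produce a flexible boundary.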
Modeling Nonlinear Vector Time Series Data
In this chapter, we review nonlinear models for vector time series data and develop new nonparametric estimation and inference for them. Vector time series data exist widely in practice. In financial markets, multiple time series are usually correlated. When analyzing several interdependent time series, in general one should consider them as a single vector time series fitted by multivariate models, which provides a useful tool for modeling interdependencies among multiple time series and for simultaneously analyzing feedback and Granger causality effects. Since nonlinear features are widely observed in time series, we consider nonlinear methodology for modeling nonlinear vector time series data, which allows flexibility in the model structure and avoids the curse of dimensionality
The global mean sea surface model WHU2013
The mean sea surface (MSS) model is an important reference for the study of charting datum and sea level change. A global MSS model named WHU2013, with 2′ × 2′ spatial resolution between 80°S and 84°N, is established in this paper by combining nearly 20 years of multi-satellite altimetric data that include Topex/Poseidon (T/P), Jason-1, Jason-2, ERS-2, ENVISAT and GFO Exact Repeat Mission (ERM) data, ERS-1/168, Jason-1/C geodetic mission data and Cryosat-2 low resolution mode (LRM) data. All the ERM data are adjusted by the collinear method to achieve the mean along-track sea surface height (SSH), and the combined dataset of T/P, Jason-1 and Jason-2 from 1993 to 2012 after collinear adjustment is used as the reference data. The sea level variations in the non-ERM data (geodetic mission data and LRM data) are mainly investigated, and a combined method is proposed to correct the sea level variations between 66°S and 66°N by along-track sea level variation time series and beyond 66°S or 66°N by seasonal sea level variations. In the crossover adjustment between multi-altimetric data, a stepwise method is used to solve the problem of inconsistency in the reference data between the high and low latitude regions. The proposed model is compared with the CNES-CLS2011 and DTU13 MSS models, and the standard deviation (STD) of the differences between the models is about 5 cm between 80°S and 84°N, less than 3 cm between 66°S and 66°N, and less than 4 cm in the China Sea and its adjacent sea. Furthermore, the three models exhibit a good agreement in the SSH differences and the along-track gradient of SSH following comparisons with satellite altimetry data